Fast Nearest Neighbors
Abstract
We review the literature on fast nearest-neighbor search, covering the basic approach of Karger and Ruhl [4] and a more recent technique called cover trees. We correct a small error in the Insert procedure of the original cover tree paper and, using a Python implementation of the basic cover tree algorithms, examine how query time actually varies with problem size.
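For reference, the problem these methods accelerate can be stated as a plain linear scan. The sketch below is that naive baseline, not the cover tree or Karger and Ruhl algorithm: it assumes points are tuples of floats compared by Euclidean distance, and the function name `knn_linear_scan` is hypothetical. Each query costs one distance evaluation per stored point, which is the per-query cost that tree-based indexes aim to reduce.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_linear_scan(data: List[Point], query: Point, k: int = 1) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs for the k stored points nearest to `query`.

    Hypothetical baseline: every point is examined once, so a query needs
    len(data) distance evaluations. Assumes Euclidean distance (math.dist).
    """
    # heapq.nsmallest keeps only the k best candidates while scanning the generator;
    # ties on distance are broken by the smaller index.
    candidates = ((math.dist(p, query), i) for i, p in enumerate(data))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    # Two nearest neighbors of (0.9, 1.2): index 1, then index 0.
    print(knn_linear_scan(points, (0.9, 1.2), k=2))
```

Setting `k=1` gives the single nearest neighbor; if `k` exceeds the number of stored points, all points are returned in order of distance.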
Similar resources
Evaluation of Fast K-nearest Neighbors Search Methods Using Real Data Sets
The problem of k-nearest neighbors (kNN) search is to find the k nearest neighbors of a query point in a given data set. To speed up this search, many fast kNN search algorithms have been proposed. The performance of fast kNN search algorithms is strongly influenced by the number of dimensions, the number of data points, and the data distribution of a data set. In the extreme cas...
Fast Large-Scale Approximate Graph Construction for NLP
Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorith...
Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when the data size and the number of clusters are large. It is well known that the processing bottleneck of k-means lies in the operation of seeking the closest centroid in each iteration. In this paper, a novel solution towards the...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet dominate users' lives, they bring not only business opportunities and time savings but also threats such as information theft and system intrusion, in both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
Fast k-nearest neighbors search using modified principal axis search tree
Article history: Available online 2 February 2010